Jasper Slingsby
Uncertainty determines the utility of a forecast:
If the uncertainty in a forecast is too high, then it is of no utility to a decision maker.
If the uncertainty is not properly quantified and presented, it can lead to poor decisions.
This leaves forecasters with four overarching questions:
The utility of a model/forecast depends on:
combined with
Together these determine the “ecological forecast horizon” (Petchey et al. 2015).
The ecological forecast horizon (from Petchey et al. 2015).
Some forecasts may lose proficiency very quickly, crossing (or starting below) the forecast proficiency threshold. If the forecast loses proficiency more slowly, or the proficiency threshold requirements are lower, the forecast horizon is further into the future.
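The forecast horizon idea can be illustrated numerically. Below is a minimal sketch with entirely made-up numbers: forecast skill is assumed to decay exponentially with lead time, and the horizon is simply the first lead time at which skill drops below a chosen proficiency threshold.

```python
import numpy as np

# Toy illustration of a forecast horizon: forecast proficiency (a skill
# score here) decays with lead time; the horizon is the lead time at
# which skill crosses a chosen proficiency threshold.
lead_times = np.arange(0, 51)          # days ahead (hypothetical)
skill = np.exp(-lead_times / 15.0)     # assumed exponential decay of skill
threshold = 0.5                        # hypothetical proficiency threshold

horizon = lead_times[skill < threshold][0]  # first lead time below threshold
print(f"forecast horizon = {horizon} days")  # prints 11
```

A slower decay (a larger time constant) or a lower threshold pushes the horizon further into the future, exactly as in Petchey et al.'s figure.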
Dietze classifies prediction uncertainty in his book (Dietze 2017a) and subsequent paper (Dietze 2017b) in the form of an equation (note that I’ve spread it over multiple lines):
\[
\begin{aligned}
\underbrace{Var[Y_{t+1}]}_\text{predictive variance} \approx \; &\underbrace{stability*uncertainty}_\text{initial conditions} \; + \\
&\underbrace{sensitivity*uncertainty}_\text{drivers} \; + \\
&\underbrace{sensitivity*(uncertainty+variability)}_\text{(parameters + random effects)} \; + \\
&\underbrace{Var[\epsilon]}_\text{process error}
\end{aligned}
\]
If we break the terms down into (something near) English, we get:
The dependent variable:
\[Var[Y_{t+1}] \approx\]
“The uncertainty in the prediction for the variable of interest (\(Y\)) in the next time step (\(t+1\)) is approximately equal to…”
And now the independent variables (or terms in the model):
\[\underbrace{stability*uncertainty}_\text{initial conditions} \; +\]
“The stability multiplied by the uncertainty in the initial conditions, plus”
\[\underbrace{sensitivity * uncertainty}_\text{drivers} \; + \]
“The sensitivity to, multiplied by the uncertainty in, external drivers, plus”
\[\underbrace{sensitivity*(uncertainty+variability)}_\text{(parameters + random effects)} + \]
“The sensitivity to, multiplied by the uncertainty and variability in, the parameters and random effects, plus”
\[\underbrace{Var[\epsilon]}_\text{process error}\] “The process error.”
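The terms above can be explored numerically by ensemble simulation. Below is a minimal sketch for a hypothetical one-step forecast model \(Y_{t+1} = r Y_t + \beta X_t + \epsilon\) with invented values for every mean and standard deviation: switching each uncertainty source on in isolation approximates that source's contribution to the predictive variance.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000  # ensemble size

# Hypothetical one-step model: Y_{t+1} = r * Y_t + beta * X_t + eps
r_mu, r_sd = 0.8, 0.05        # parameter (stability) and its uncertainty
beta_mu, beta_sd = 0.5, 0.1   # driver sensitivity and its uncertainty
y0_mu, y0_sd = 10.0, 1.0      # initial condition and its uncertainty
x_mu, x_sd = 2.0, 0.5         # external driver and its uncertainty
proc_sd = 0.3                 # process error

def forecast(ic=False, driver=False, params=False, process=False):
    """One-step ensemble forecast, switching each uncertainty source on/off."""
    y0 = rng.normal(y0_mu, y0_sd, n) if ic else y0_mu
    x = rng.normal(x_mu, x_sd, n) if driver else x_mu
    rr = rng.normal(r_mu, r_sd, n) if params else r_mu
    bb = rng.normal(beta_mu, beta_sd, n) if params else beta_mu
    eps = rng.normal(0, proc_sd, n) if process else 0.0
    return rr * y0 + bb * x + eps

# Variance with one source switched on approximates its contribution
for src in ["ic", "driver", "params", "process"]:
    v = np.var(forecast(**{src: True}))
    print(f"{src:8s} variance contribution: {v:.3f}")

total = np.var(forecast(ic=True, driver=True, params=True, process=True))
print(f"total predictive variance: {total:.3f}")
```

With these made-up numbers the initial-condition term dominates, but the ranking is entirely a property of the chosen stabilities, sensitivities, and uncertainties, which is the point of the equation.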
There are many methods, but it’s worth recognizing that these are actually two steps:
This could be a lecture series of its own. In short, there are five methods to address step 1, and most have related methods for step 2 (see the table below). The methods differ in whether they:
They also trade off efficiency against flexibility. The most efficient methods (analytical) have the most rigid requirements and assumptions, while the most flexible (numerical) can be computationally taxing (or impossible, given a complex enough model).
| Approach | Distribution | Moments |
|---|---|---|
| Analytical | Variable Transform | Analytical Moments (Kalman Filter) |
| | | Taylor Series (Extended Kalman Filter) |
| Numerical | Monte Carlo (Particle Filter) | Ensemble (Ensemble Kalman Filter) |
Note: It is possible to propagate uncertainty through the model and into your forecast in one step with Bayesian methods, by treating the forecast states as “missing data” values and estimating posterior distributions for them. This would essentially fit with Monte Carlo methods in the table. This approach may not suit all forecasting circumstances though.
1. Working out where it’s coming from (by analyzing and partitioning the sources of uncertainty).
2. Targeting sources of uncertainty that can be reduced with the best return on investment (important to note that these may not be the biggest sources of uncertainty, just the cheapest and easiest to resolve).
Addressing 1 requires looking at the two ways in which things can be important for the uncertainty in predictions (largely covered in Dietze’s equation above):
Addressing 2 may not be as straightforward as you’d hope. Parameters that are highly uncertain, and to which your state variable (\(Y\)) is highly sensitive, will cause the most uncertainty in your predictions. That said, given limited resources, they may not be the best targets for reducing uncertainty, for a number of reasons, e.g.
In fact, by this stage you should have most of the pieces of the puzzle to help you build a model to predict where your effort is best invested by exploring the relationship between sample size and variance contribution to overall model uncertainty! You can even include economic principles to estimate monetary or person-hour implications. This is called observational design.
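A minimal sketch of this observational-design idea, with entirely hypothetical numbers: each uncertainty source has a current variance contribution, a per-sample cost, and the usual \(1/n\) shrinkage of estimation variance with sample size, and we ask where a fixed budget buys the largest variance reduction.

```python
import numpy as np

# Hypothetical uncertainty sources:
#            current variance, cost per extra sample, current sample size
sources = {
    "params":  (0.29, 50.0, 20),
    "drivers": (0.06, 5.0, 30),
    "ic":      (0.64, 200.0, 10),
}

budget = 500.0  # hypothetical budget (monetary or person-hours)

def variance_after(var_now, n_now, n_extra):
    """Estimation variance scales roughly as 1/n, shrinking with added samples."""
    return var_now * n_now / (n_now + n_extra)

# Reduction in predictive variance if the whole budget goes to one source
for name, (v, cost, n_now) in sources.items():
    gain = v - variance_after(v, n_now, budget / cost)
    print(f"{name:8s}: variance reduction {gain:.3f} for {budget:.0f} spent")
```

Note that with these invented numbers the cheapest source to sample is not the best investment, illustrating that return on investment depends jointly on cost, current sample size, and variance contribution. Adding an economic model of costs, as the text suggests, just means replacing the constants with real estimates.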
Bayes’ Rule:
\[ \underbrace{p(\theta|D)}_\text{posterior} \; \propto \; \underbrace{p(D|\theta)}_\text{likelihood} \;\; \underbrace{p(\theta)}_\text{prior} \; \]
The posterior is proportional to the likelihood times the prior.
\[ \underbrace{p(\theta|D)}_\text{posterior} \; \propto \; \underbrace{p(D|\theta)}_\text{likelihood} \;\; \underbrace{p(\theta)}_\text{prior} \; \]
The posterior is the conditional probability of the parameters given the data, \(p(\theta|D)\), and provides a probability distribution for the values any parameter can take.
This allows us to represent uncertainty in the model and forecasts as probabilities, which is powerful for indicating the probability of our forecast being correct.
\[ \underbrace{p(\theta|D)}_\text{posterior} \; \propto \; \underbrace{p(D|\theta)}_\text{likelihood} \;\; \underbrace{p(\theta)}_\text{prior} \; \]
The likelihood \(p(D|\theta)\) represents the probability of the data \(D\) given the model with parameter values \(\theta\), and is used in analyses to find the likelihood profiles of the parameters.
This term is what Maximum Likelihood Estimation maximizes: the best estimates of the parameters are those values of \(\theta\) that maximize the probability of the observed data under the model.
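A minimal sketch of maximum likelihood estimation, using simulated data and a simple grid search (for a Normal model with known variance, the MLE of the mean is just the sample mean, which makes the result easy to check):

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(loc=3.0, scale=1.0, size=200)  # simulated observations

# Log-likelihood of a Normal(theta, 1) model over a grid of candidate means
thetas = np.linspace(0, 6, 601)
loglik = np.array([np.sum(-0.5 * (data - t) ** 2) for t in thetas])

# The MLE is the theta that maximizes the (log-)likelihood of the data
theta_mle = thetas[np.argmax(loglik)]
print(f"MLE of the mean: {theta_mle:.2f} (sample mean: {data.mean():.2f})")
```

The array `loglik` evaluated over the grid is exactly the likelihood profile mentioned above, just discretized.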
\[ \underbrace{p(\theta|D)}_\text{posterior} \; \propto \; \underbrace{p(D|\theta)}_\text{likelihood} \;\; \underbrace{p(\theta)}_\text{prior} \; \]
The prior is the marginal probability of the parameters, \(p(\theta)\).
It represents the credibility of the parameter values, \(\theta\), without the data, and is specified using our prior belief of what the parameters should be, before interrogating the data. This provides a formal probabilistic framework for the scientific method, in that new evidence must be considered in the context of previous knowledge, providing the opportunity to update our beliefs.
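Bayes’ rule can be applied directly on a grid for simple problems. Below is a minimal sketch for a coin-flip (binomial) example with invented data, showing the posterior as literally likelihood times prior, renormalized:

```python
import numpy as np

# Grid approximation of Bayes' rule: posterior ∝ likelihood × prior
theta = np.linspace(0, 1, 1001)                      # candidate P(heads)
prior = np.exp(-0.5 * ((theta - 0.5) / 0.15) ** 2)   # prior belief: roughly fair coin
k, n = 17, 20                                        # hypothetical data: 17 heads in 20 flips
likelihood = theta ** k * (1 - theta) ** (n - k)

posterior = likelihood * prior
posterior /= posterior.sum() * (theta[1] - theta[0])  # normalize to a density

post_mean = np.sum(theta * posterior) * (theta[1] - theta[0])
print(f"posterior mean: {post_mean:.2f}")
```

The posterior mean lands between the prior belief (0.5) and the data’s maximum-likelihood value (0.85): the data have updated, but not replaced, our prior belief, which is the formal version of the scientific method described above.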
Data can enter (or be fused with) a model in a variety of ways. Here we’ll discuss these and then give an example of the Fynbos postfire recovery model used in the practical.
The opportunities for data fusion are linked to model structure, so we’ll revisit how some aspects of model structure change as we move from Least Squares to Maximum Likelihood Estimation to “single-level” Bayes to Hierarchical Bayes and the data fusion opportunities provided by each.
Conceptually (and perhaps over-simplistically), one can think of the changes in model structure as being the addition of model layers, each of which provide more opportunities for data fusion.
Least Squares makes no distinction between the process model and the data model.
- the process model models the drivers determining the pattern observed (i.e. the model equation you will be familiar with, such as a linear model)
- the data model models the observation error or data observation process, i.e. the factors that may cause mismatch between the process model and the data
- in least squares the data model can only ever be a normal (also called Gaussian) distribution, because we require homogeneity of variance in order to minimize the sums of squares
- the only opportunity to add data to a least squares model is via the process model
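A minimal sketch of these points, with simulated data: the process model is a straight line, the (implicit) data model is Gaussian noise with constant variance, and the data enter only through the process model’s design matrix.

```python
import numpy as np

rng = np.random.default_rng(7)

# Process model: y = a + b * x  (the drivers determining the pattern)
x = np.linspace(0, 10, 50)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, size=x.size)  # Gaussian observation error

# Least squares: the implicit data model is Normal with constant variance,
# and the only way data enter is via the process model's design matrix
X = np.column_stack([np.ones_like(x), x])
(a_hat, b_hat), *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"intercept estimate: {a_hat:.2f}, slope estimate: {b_hat:.2f}")
```

There is no separate place in this fit to describe how the observations were made; relaxing that restriction is exactly what the likelihood and hierarchical-Bayes layers discussed next provide.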